Information Structure and Hypertext Search Results
Abstract
This paper proposes a framework for integrating a principled theory of information structure with traditional web-based search presentation techniques, and sketches the benefits of such an approach. It begins by reviewing research that has been done in the technical writing industry on on-line information structure, highlighting some of the difficulties and challenges faced by developers and users of hypertext-based documentation. It then introduces Peircean categories as a natural classification system for on-line information, with the hypothesis that such a classification can solve some of the fundamental problems faced by information researchers, and can unify several aspects of the eclectic research findings of the technical writing industry. It summarizes an empirical investigation that supports the classification approach. It then sketches the proposed information typology, showing how the Peircean categories explain the otherwise confusing intricacies of technical information phenomena. Finally, it mentions possible applications to the issue of structuring search results in a consistent and useful manner.
Information types and current research
Much of the technical documentation being written in the software industry today is intended for on-line viewing. Technical writers typically divide their documentation into discrete modules and then link the modules together to form a hypertext information system. For large documentation projects, the technical writing team may decide ahead of time to use specific types of information modules. For example, they may decide to always present ‘overview’ and ‘procedural’ information in separate modules, each with a distinct template and layout. The intent is to ensure a consistent manner of presenting information (despite the fact that the documentation has multiple authors) and to ensure that readers have a consistent experience using the documentation.
The information types used in a project form the basis for how documentation is divided up, organized, and assembled by writers. Unfortunately, coming up with a truly useful set of information types is an ongoing and costly task for most technical writing organizations. Even when consensus is reached and a set of information types is established, the technological and organizational landscape of the company can change so much before implementation is possible that the earlier work on information types becomes unusable. Thus, in the business world, progress on a general theory and use of information types sometimes takes a back seat to the more pressing, practical concerns of specific projects and technology transitions.
It has been noted that a theoretical foundation is sorely lacking in many information-structuring applications (Johnson, 1989:26). Accordingly, a literature review was carried out which confirmed this observation and which highlighted some unresolved issues in the research. The approach proposed in this paper, a general theory of information types, is meant to provide a principled foundation for structuring information in a wide range of applications, including web-based search.
A review of existing information typologies (two used by Novell, Inc. for specific technical writing projects and another used by Information Mapping, Inc. in its technical writing seminars) shows that these typologies suffer from a lack of both adequate distinctions and adequate correlations among the various categories in the typologies (Carmack 2000:1-20). For example, one Novell typology (1995) combines general overviews, process descriptions, and feature descriptions into a single Overview category. The other Novell typology (1997) places two similar kinds of command descriptions into completely different categories.
The Information Mapping typology (1996) sets forth six distinct information categories, but several of the categories contain elements that correlate strongly with elements in other categories. For example, the Concept category contains abstract definitions and concrete examples that correlate with the abstract syntax descriptions and concrete hardware drawings in the Structure category, but these correlations are not acknowledged or explained by the information typology.
A review of information typology research published in technical writing forums and societies (Carmack 2000:20-49) shows that the most fundamental typological division that technical readers and writers make is that between procedural and conceptual information. Empirical studies show that procedural information impacts the performance of readers' current tasks only, while conceptual information impacts readers' future tasks only (Ummelen 1997:289). Procedural information is shown to use direct, literal prose and to cast the reader as the agent of most actions (using the implied ‘you’), while conceptual information is shown to use longer prose, to cast technology as the agent of most actions, and to use relatively more modals and modifiers.
The research also identifies decision-support information as an important category that weaves together both procedural and conceptual information for the purpose of helping readers respond to unexpected problems and situations. This information type is shown to branch the reader to different paths depending on various conditions, and (like conceptual information) to cast technology as the agent of most actions. To a lesser degree, the research describes other, more specialized information types such as lists and examples.
Lists are shown to help the reader scan items of information when trying to find a match with an item that the reader has in mind, while examples are shown to create materialized instances in the reader’s mind that convey the meaning of abstract concepts.
Finally, the research presents alternative theories for categorizing technical information typologically based on either different kinds of ‘reader roles’ or different kinds of ‘human-machine interactions’. The ‘reader role’ theory (Simpson 1989:86) categorizes information based on whether the information caters to a reader who is acting in a learner, doer, or searcher role. The ‘human-machine interaction’ theory (Rasmussen
Similar resources
Kontextsensitive Visualisierung von Suchergebnissen
The project ‘Virtuell Information Spaces’ deals with the problem of distributed, structured information spaces like electronic market places. The project will support information resource selection, the specific search process, and presentation of search results. In this paper we will introduce a specific method for visualizing search results being embedded in the hypertext structure of an elec...
Using Hypertext Composites in Structured Query and Search
This paper describes a part of our effort to answer the open question of how to use the structural and semantic information that is representable with the new Web standards efficiently for searching the Internet and filtering and retrieving relevant information. Link-based hypertext composites are applied in formulating structured queries and deriving structured search results. By enabling users t...
Automatic Hypertext Keyphrase Detection
This paper describes initial experiments in applying knowledge derived from hypertext structure to domain-specific automatic keyphrase extraction. It is found that hyperlink information can improve the effectiveness of automatic keyphrase extraction by 50%. However, the primary goal of this project is to apply similar techniques to information retrieval tasks such as web searching. These initia...
The Anatomy of a Large-Scale Hypertextual Web Search Engine
In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/ To ...
Reprint of: The anatomy of a large-scale hypertextual web search engine
In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/ To ...
The Hypertext Internet Connection: E-mail, Online Search, Gopher
In this paper we show how to handle and organize the large amount of information accessible through the Internet or other public communication networks in a hypertext environment. The C(K)onstance-Hypertext-System (KHS) uses typed units to indicate the differences and the content and structure of information, comprising text, forms, images, and pointers to external information. We show how to imbed ...
Publication date: 2000